D’Agostino’s \(K^2\) normality test (dagostino_k2)#
D’Agostino–Pearson’s \(K^2\) test is an omnibus normality test: it looks for departures from normality due to skewness and/or kurtosis.
Learning goals#
By the end you should be able to:
explain what \(K^2\) tests (and what it does not test)
compute skewness and kurtosis and connect them to distribution shape
implement the full test in NumPy only (low-level)
use Plotly visuals (histogram, Q–Q) to understand why normality fails
interpret the statistic and p-value responsibly
Table of contents#
What the test is for
Hypotheses + outputs
Intuition: skewness + kurtosis
NumPy-only implementation
Visual diagnostics
Geometry + chi-square interpretation
Under the null: \(K^2 \sim \chi^2(2)\) and p-values are uniform
Interpreting the result
Pitfalls + guidance
Exercises + references
import math
import numpy as np
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=6, suppress=True)
rng = np.random.default_rng(42)
1) What the test is for#
You typically use D’Agostino’s \(K^2\) test when you want a quick, automated check of whether a sample is plausibly normal.
Common workflows:
checking normality of residuals (linear regression / ANOVA diagnostics)
choosing between parametric vs non-parametric tests
deciding whether a normal noise model is a reasonable approximation
validating simulation outputs that are supposed to be normal
What it does not do:
it does not tell you which non-normal distribution you have
it does not “prove normality” when the p-value is large
2) Hypotheses + outputs#
\(H_0\) (null): the data are a random sample from a normal distribution (some \(\mu, \sigma\)).
\(H_1\) (alternative): the data are not from a normal distribution.
The test returns:
K2: the omnibus statistic (larger → more evidence against normality)
p_value: the upper-tail probability \(P(K^2_{H_0} \ge K^2_{\mathrm{obs}})\)
Rules of thumb:
if p_value < α (e.g. 0.05), you reject normality at level α
if p_value ≥ α, you fail to reject (not the same as “accept normality”)
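A minimal sketch of this decision rule (the decide helper and the example p-values are purely illustrative):
def decide(p_value, alpha=0.05):
    # Reject H0 (normality) at level alpha when the p-value falls below alpha.
    return 'reject H0' if p_value < alpha else 'fail to reject H0'
decide(0.012), decide(0.40)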
Sample size notes:
most references (and SciPy) require n ≥ 8
the kurtosis transform is asymptotic and becomes more reliable as n grows (you’ll often see guidance like “n > 20”)
3) Intuition: skewness + kurtosis#
\(K^2\) combines two shape diagnostics:
Skewness (3rd standardized moment): asymmetry
positive skew: long right tail
negative skew: long left tail
Kurtosis (4th standardized moment): tail heaviness / peakedness
normal has Pearson kurtosis = 3
heavier tails → larger kurtosis; lighter tails → smaller kurtosis
\(K^2\) is “omnibus” because it can reject normality if either skewness or kurtosis (or both) look unusual under \(H_0\).
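A quick NumPy sketch makes this concrete (the sample size, the lognormal sigma, and the demo_* names are arbitrary illustration choices; the same moment formulas are implemented carefully in the next section):
demo_rng = np.random.default_rng(0)
demo_normal = demo_rng.normal(size=5000)
demo_lognormal = demo_rng.lognormal(sigma=0.6, size=5000)
for name, s in [('normal', demo_normal), ('lognormal', demo_lognormal)]:
    d = s - s.mean()
    skew = np.mean(d**3) / np.mean(d**2) ** 1.5  # ≈ 0 for a normal sample, > 0 for a right-skewed one
    kurt = np.mean(d**4) / np.mean(d**2) ** 2    # ≈ 3 for a normal sample, > 3 for heavy tails
    print(f'{name:10s} skewness={skew:6.3f}  Pearson kurtosis={kurt:6.3f}')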
4) NumPy-only implementation (low-level)#
We implement the same basic steps as scipy.stats.normaltest:
compute sample skewness and kurtosis
transform each into an approximately standard-normal z-score
combine them: \(K^2 = z_{skew}^2 + z_{kurt}^2\)
compute the p-value from \(\chi^2(2)\)
def _as_1d_finite(x):
x = np.asarray(x, dtype=float).ravel()
return x[np.isfinite(x)]
def _central_moments_1d(x):
x = _as_1d_finite(x)
mean = x.mean()
d = x - mean
m2 = np.mean(d**2)
m3 = np.mean(d**3)
m4 = np.mean(d**4)
return mean, m2, m3, m4
def sample_skewness(x):
_, m2, m3, _ = _central_moments_1d(x)
if m2 == 0:
return np.nan
return m3 / (m2 ** 1.5)
def sample_kurtosis_pearson(x):
_, m2, _, m4 = _central_moments_1d(x)
if m2 == 0:
return np.nan
return m4 / (m2 ** 2) # normal → 3
def dagostino_skew_z(x):
x = _as_1d_finite(x)
n = x.size
if n < 8:
raise ValueError(f'dagostino_skew_z requires n >= 8; got n={n}.')
g1 = sample_skewness(x)
if not np.isfinite(g1):
return np.nan
y = g1 * np.sqrt(((n + 1) * (n + 3)) / (6.0 * (n - 2)))
beta2 = (
3.0
* (n**2 + 27 * n - 70)
* (n + 1)
* (n + 3)
/ ((n - 2.0) * (n + 5) * (n + 7) * (n + 9))
)
W2 = -1.0 + np.sqrt(2.0 * (beta2 - 1.0))
delta = 1.0 / np.sqrt(0.5 * np.log(W2))
alpha = np.sqrt(2.0 / (W2 - 1.0))
# Exactly-zero y can happen for perfectly symmetric toy data; match common implementations.
if y == 0.0:
y = 1.0
z = delta * np.log(y / alpha + np.sqrt((y / alpha) ** 2 + 1.0))
return z
def anscombe_glynn_kurt_z(x):
x = _as_1d_finite(x)
n = x.size
if n < 5:
raise ValueError(f'anscombe_glynn_kurt_z requires n >= 5; got n={n}.')
b2 = sample_kurtosis_pearson(x)
if not np.isfinite(b2):
return np.nan
E = 3.0 * (n - 1) / (n + 1)
varb2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
x_stat = (b2 - E) / np.sqrt(varb2)
sqrtbeta1 = (
6.0
* (n * n - 5 * n + 2)
/ ((n + 7) * (n + 9))
* np.sqrt((6.0 * (n + 3) * (n + 5)) / (n * (n - 2) * (n - 3)))
)
A = 6.0 + 8.0 / sqrtbeta1 * (2.0 / sqrtbeta1 + np.sqrt(1.0 + 4.0 / (sqrtbeta1**2)))
term1 = 1.0 - 2.0 / (9.0 * A)
denom = 1.0 + x_stat * np.sqrt(2.0 / (A - 4.0))
if denom == 0.0:
return np.nan
term2 = np.sign(denom) * (((1.0 - 2.0 / A) / np.abs(denom)) ** (1.0 / 3.0))
z = (term1 - term2) / np.sqrt(2.0 / (9.0 * A))
return z
def dagostino_k2_test(x):
x = _as_1d_finite(x)
n = x.size
if n < 8:
raise ValueError(f'dagostino_k2_test requires n >= 8; got n={n}.')
g1 = sample_skewness(x)
b2 = sample_kurtosis_pearson(x)
z_skew = dagostino_skew_z(x)
z_kurt = anscombe_glynn_kurt_z(x)
K2 = z_skew**2 + z_kurt**2
p_value = float(np.exp(-0.5 * K2)) # chi-square(df=2) survival function
return {
'n': int(n),
'skewness': float(g1),
'kurtosis_pearson': float(b2),
'z_skew': float(z_skew),
'z_kurt': float(z_kurt),
'K2': float(K2),
'p_value': p_value,
}
Quick example: inspect the components#
The two component z-scores tell you why the omnibus statistic is large:
large |z_skew| → asymmetry (skewed distribution)
large |z_kurt| → tails/peakedness differ from normal
x = rng.normal(0, 1, size=200)
ours = dagostino_k2_test(x)
ours
# Optional SciPy check (if installed)
try:
    from scipy.stats import normaltest
    sp = normaltest(x)
    print({"scipy_K2": float(sp.statistic), "scipy_p_value": float(sp.pvalue)})
except Exception as e:
    print(f"SciPy not available: {e}")
5) Visual diagnostics#
Normality tests reduce a full distribution to a single number, so you should almost always pair \(K^2\) with:
a histogram + fitted normal curve
a Q–Q plot (straight line suggests normality; curvature suggests tails/skew)
def normal_pdf(x, mu=0.0, sigma=1.0):
x = np.asarray(x, dtype=float)
return (1.0 / (sigma * np.sqrt(2.0 * np.pi))) * np.exp(-0.5 * ((x - mu) / sigma) ** 2)
def norm_ppf(p):
"""Acklam's rational approximation of the standard normal quantile (NumPy-only)."""
p = np.asarray(p, dtype=float)
if np.any((p <= 0) | (p >= 1)):
raise ValueError('p must be in (0, 1)')
a = np.array([
-3.969683028665376e01,
2.209460984245205e02,
-2.759285104469687e02,
1.383577518672690e02,
-3.066479806614716e01,
2.506628277459239e00,
])
b = np.array([
-5.447609879822406e01,
1.615858368580409e02,
-1.556989798598866e02,
6.680131188771972e01,
-1.328068155288572e01,
])
c = np.array([
-7.784894002430293e-03,
-3.223964580411365e-01,
-2.400758277161838e00,
-2.549732539343734e00,
4.374664141464968e00,
2.938163982698783e00,
])
d = np.array([
7.784695709041462e-03,
3.224671290700398e-01,
2.445134137142996e00,
3.754408661907416e00,
])
plow = 0.02425
phigh = 1.0 - plow
x = np.empty_like(p)
m1 = p < plow
if np.any(m1):
q = np.sqrt(-2.0 * np.log(p[m1]))
num = (((((c[0] * q + c[1]) * q + c[2]) * q + c[3]) * q + c[4]) * q + c[5])
den = ((((d[0] * q + d[1]) * q + d[2]) * q + d[3]) * q + 1.0)
x[m1] = num / den
m2 = (p >= plow) & (p <= phigh)
if np.any(m2):
q = p[m2] - 0.5
r = q * q
num = (((((a[0] * r + a[1]) * r + a[2]) * r + a[3]) * r + a[4]) * r + a[5]) * q
den = (((((b[0] * r + b[1]) * r + b[2]) * r + b[3]) * r + b[4]) * r + 1.0)
x[m2] = num / den
m3 = p > phigh
if np.any(m3):
q = np.sqrt(-2.0 * np.log(1.0 - p[m3]))
num = (((((c[0] * q + c[1]) * q + c[2]) * q + c[3]) * q + c[4]) * q + c[5])
den = ((((d[0] * q + d[1]) * q + d[2]) * q + d[3]) * q + 1.0)
x[m3] = -(num / den)
return x
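A quick sanity check on the approximation (Acklam's formula is accurate far beyond what plotting needs), using quantiles most readers know by heart:
norm_ppf(np.array([0.025, 0.5, 0.975]))  # expected ≈ [-1.96, 0.0, 1.96]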
def qq_points_standardized(x):
x = _as_1d_finite(x)
x = (x - x.mean()) / x.std(ddof=0)
xs = np.sort(x)
n = xs.size
p = (np.arange(1, n + 1) - 0.5) / n
z = norm_ppf(p)
return z, xs
def add_hist_with_normal_fit(fig, x, row, col, title):
x = _as_1d_finite(x)
mu = x.mean()
sigma = x.std(ddof=0)
grid = np.linspace(x.min(), x.max(), 400)
fig.add_trace(
go.Histogram(
x=x,
nbinsx=45,
histnorm='probability density',
opacity=0.65,
name=title,
),
row=row,
col=col,
)
fig.add_trace(
go.Scatter(
x=grid,
y=normal_pdf(grid, mu=mu, sigma=sigma),
mode='lines',
name='Normal fit',
),
row=row,
col=col,
)
fig.update_xaxes(title_text='x', row=row, col=col)
fig.update_yaxes(title_text='density', row=row, col=col)
def add_qq(fig, x, row, col):
z, xs = qq_points_standardized(x)
lo = float(min(z.min(), xs.min()))
hi = float(max(z.max(), xs.max()))
fig.add_trace(
go.Scatter(x=z, y=xs, mode='markers', marker=dict(size=5), name='Q–Q points'),
row=row,
col=col,
)
fig.add_trace(
go.Scatter(x=[lo, hi], y=[lo, hi], mode='lines', name='45° line'),
row=row,
col=col,
)
fig.update_xaxes(title_text='Theoretical quantiles (N(0,1))', row=row, col=col)
fig.update_yaxes(title_text='Sample quantiles (standardized)', row=row, col=col)
n = 400
x_normal = rng.normal(0, 1, size=n)
x_logn = rng.lognormal(mean=0.0, sigma=0.6, size=n)
x_mix = np.concatenate([rng.normal(-2, 1, size=n // 2), rng.normal(2, 1, size=n - n // 2)])
datasets = {'normal': x_normal, 'lognormal': x_logn, 'mixture': x_mix}
titles = []
for name, x in datasets.items():
res = dagostino_k2_test(x)
titles.extend([f"{name}: hist (p={res['p_value']:.3g})", f"{name}: Q–Q"])
fig = make_subplots(rows=3, cols=2, subplot_titles=titles, horizontal_spacing=0.12, vertical_spacing=0.12)
for i, (name, x) in enumerate(datasets.items(), start=1):
add_hist_with_normal_fit(fig, x, row=i, col=1, title=f'{name} sample')
add_qq(fig, x, row=i, col=2)
fig.update_layout(height=950, showlegend=False, title_text='Histogram + Q–Q (with K² p-values)')
fig.show()
6) Geometry + chi-square interpretation#
The test first turns skewness and kurtosis into two approximately standard-normal z-scores:
\(z_{skew}\)
\(z_{kurt}\)
and then combines them: \(K^2 = z_{skew}^2 + z_{kurt}^2\)
So you can think of \(K^2\) as a squared distance from the origin in the plane \((z_{skew}, z_{kurt})\).
Under \(H_0\), \(K^2\) is approximately \(\chi^2\) with 2 degrees of freedom, and the p-value is \(p = P(\chi^2_2 \ge K^2) = \exp(-K^2/2)\).
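The closed-form p-value follows because \(\chi^2_2\) is just an exponential distribution with mean 2: its density is \(\tfrac{1}{2}e^{-t/2}\) for \(t \ge 0\), so \(P(\chi^2_2 \ge k) = \int_k^\infty \tfrac{1}{2} e^{-t/2}\,dt = e^{-k/2}\). This is exactly the p_value = np.exp(-0.5 * K2) line in the implementation above.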
def dagostino_components_batch(samples):
"""Return (z_skew, z_kurt) for many samples at once (samples shape: [sims, n])."""
samples = np.asarray(samples, dtype=float)
n = samples.shape[1]
if n < 8:
raise ValueError('need n >= 8')
mean = samples.mean(axis=1, keepdims=True)
d = samples - mean
m2 = np.mean(d**2, axis=1)
m3 = np.mean(d**3, axis=1)
m4 = np.mean(d**4, axis=1)
g1 = m3 / (m2 ** 1.5)
b2 = m4 / (m2 ** 2)
# skew z
y = g1 * np.sqrt(((n + 1) * (n + 3)) / (6.0 * (n - 2)))
beta2 = (
3.0
* (n**2 + 27 * n - 70)
* (n + 1)
* (n + 3)
/ ((n - 2.0) * (n + 5) * (n + 7) * (n + 9))
)
W2 = -1.0 + np.sqrt(2.0 * (beta2 - 1.0))
delta = 1.0 / np.sqrt(0.5 * np.log(W2))
alpha = np.sqrt(2.0 / (W2 - 1.0))
y = np.where(y == 0.0, 1.0, y)
z_skew = delta * np.log(y / alpha + np.sqrt((y / alpha) ** 2 + 1.0))
# kurt z
E = 3.0 * (n - 1) / (n + 1)
varb2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
x_stat = (b2 - E) / np.sqrt(varb2)
sqrtbeta1 = (
6.0
* (n * n - 5 * n + 2)
/ ((n + 7) * (n + 9))
* np.sqrt((6.0 * (n + 3) * (n + 5)) / (n * (n - 2) * (n - 3)))
)
A = 6.0 + 8.0 / sqrtbeta1 * (2.0 / sqrtbeta1 + np.sqrt(1.0 + 4.0 / (sqrtbeta1**2)))
term1 = 1.0 - 2.0 / (9.0 * A)
denom = 1.0 + x_stat * np.sqrt(2.0 / (A - 4.0))
term2 = np.sign(denom) * np.where(
denom == 0.0,
np.nan,
((1.0 - 2.0 / A) / np.abs(denom)) ** (1.0 / 3.0),
)
z_kurt = (term1 - term2) / np.sqrt(2.0 / (9.0 * A))
return z_skew, z_kurt
n = 60
sims = 2500
samples_normal = rng.normal(0, 1, size=(sims, n))
samples_logn = rng.lognormal(mean=0.0, sigma=0.6, size=(sims, n))
samples_t3 = rng.standard_t(df=3, size=(sims, n))
# symmetric but non-normal: 0.5*N(-2,1) + 0.5*N(2,1)
sign = rng.choice([-1.0, 1.0], size=(sims, n))
samples_mix = rng.normal(loc=2.0, scale=1.0, size=(sims, n)) * sign
z1_n, z2_n = dagostino_components_batch(samples_normal)
z1_l, z2_l = dagostino_components_batch(samples_logn)
z1_t, z2_t = dagostino_components_batch(samples_t3)
z1_m, z2_m = dagostino_components_batch(samples_mix)
alpha = 0.05
k2_thresh = -2.0 * np.log(alpha)
r = float(np.sqrt(k2_thresh))
theta = np.linspace(0, 2 * np.pi, 400)
circle_x = r * np.cos(theta)
circle_y = r * np.sin(theta)
fig = go.Figure()
fig.add_trace(go.Scatter(x=z1_n, y=z2_n, mode='markers', name='normal', opacity=0.35, marker=dict(size=4)))
fig.add_trace(go.Scatter(x=z1_l, y=z2_l, mode='markers', name='lognormal', opacity=0.35, marker=dict(size=4)))
fig.add_trace(go.Scatter(x=z1_t, y=z2_t, mode='markers', name='t(df=3)', opacity=0.35, marker=dict(size=4)))
fig.add_trace(go.Scatter(x=z1_m, y=z2_m, mode='markers', name='mixture', opacity=0.35, marker=dict(size=4)))
fig.add_trace(go.Scatter(x=circle_x, y=circle_y, mode='lines', name=f'reject @ α={alpha}', line=dict(color='black', dash='dash')))
fig.update_layout(
title=f'Z-space view: K² = z_skew² + z_kurt² (n={n}, sims={sims} per distribution)',
xaxis_title='z_skew',
yaxis_title='z_kurt',
width=900,
height=650,
)
fig.update_yaxes(scaleanchor='x', scaleratio=1)
fig.show()
7) Under the null: \(K^2 \sim \chi^2(2)\) and p-values are uniform#
A good self-check is to simulate normal samples and confirm:
\(K^2\) matches the \(\chi^2(2)\) shape
p-values look approximately Uniform(0,1)
n = 80
sims = 12000
samples = rng.normal(0, 1, size=(sims, n))
z_skew, z_kurt = dagostino_components_batch(samples)
K2 = z_skew**2 + z_kurt**2
pvals = np.exp(-0.5 * K2)
grid = np.linspace(0.0, float(np.quantile(K2, 0.99)), 400)
chi2_pdf = 0.5 * np.exp(-0.5 * grid) # df=2
fig1 = go.Figure()
fig1.add_trace(go.Histogram(x=K2, nbinsx=60, histnorm='probability density', opacity=0.65, name='simulated K²'))
fig1.add_trace(go.Scatter(x=grid, y=chi2_pdf, mode='lines', name='chi-square(df=2) pdf'))
fig1.update_layout(title='K² under H0 ≈ chi-square(df=2)', xaxis_title='K²', yaxis_title='density', barmode='overlay')
fig1.show()
fig2 = go.Figure()
fig2.add_trace(go.Histogram(x=pvals, nbinsx=50, histnorm='probability density', opacity=0.65, name='p-values'))
fig2.add_trace(go.Scatter(x=[0, 1], y=[1, 1], mode='lines', name='Uniform(0,1) density', line=dict(color='black', dash='dash')))
fig2.update_layout(title='p-values under H0 should be ~Uniform(0,1)', xaxis_title='p-value', yaxis_title='density', barmode='overlay')
fig2.show()
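As an extra check on the same simulation, the empirical rejection rate at α = 0.05 should sit near the nominal level, up to Monte Carlo noise:
# Fraction of null samples rejected at alpha = 0.05; expect a value close to 0.05.
float(np.mean(pvals < 0.05))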
8) Interpreting the result (what it means)#
p_value is not “the probability the data are normal”. It is: assuming normality, how surprising is a \(K^2\) at least this large?
Small p_value → evidence that skewness and/or kurtosis are inconsistent with a normal sample.
Large p_value → you did not find strong evidence against normality (but normality can still be false).
Use the components:
large |z_skew| → asymmetry
large |z_kurt| → tails/peakedness differ from normal
Reporting example:
D’Agostino \(K^2\) normality test, n=200: \(K^2\)=2.40, p=0.30.
With large n, rejection can be “statistically significant” but practically irrelevant — always inspect the Q–Q plot.
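A small helper for producing that kind of report line from the result dict returned by dagostino_k2_test (the wording and rounding are just one reasonable convention):
def report_k2(res):
    # res is the dict returned by dagostino_k2_test above.
    return (f"D'Agostino K² normality test, n={res['n']}: "
            f"K²={res['K2']:.2f}, p={res['p_value']:.2g}")
report_k2(ours)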
9) Pitfalls + guidance#
Large n: tiny, practically irrelevant deviations can produce tiny p-values (high power).
Outliers: a few extreme points can dominate skewness/kurtosis and trigger rejection.
Dependence: the test assumes i.i.d. samples; autocorrelation (time series) can invalidate p-values.
Discrete/rounded data: many ties can cause deviations even when a normal model is a decent approximation.
Small n: the approximations are rough; consider Shapiro–Wilk for very small samples and always inspect Q–Q plots.
Diagnostics checklist:
for regression diagnostics, test the residuals (see the sketch after this list)
inspect z_skew vs z_kurt to see why it rejects
if normality matters for inference, consider robust or resampling-based alternatives
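As a sketch of the residual-diagnostics workflow mentioned in the checklist (the simulated intercept, slope, and noise level are arbitrary illustration choices):
# Simulate a simple linear model with normal noise, fit it by least squares, and test the residuals.
x_reg = rng.uniform(0, 10, size=150)
y_reg = 1.5 + 0.8 * x_reg + rng.normal(scale=1.0, size=150)
X = np.column_stack([np.ones_like(x_reg), x_reg])
beta_hat, *_ = np.linalg.lstsq(X, y_reg, rcond=None)
residuals = y_reg - X @ beta_hat
dagostino_k2_test(residuals)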
10) Exercises#
Estimate the empirical Type I error at α=0.05 by simulating many normal samples.
Compare power against lognormal, t(3), and a Gaussian mixture as you vary n.
Fit a linear regression, compute residuals, and run \(K^2\) on the residuals. Does it match your Q–Q plot?
Apply a transform (log, Box–Cox) to a skewed sample and compare z_skew, z_kurt, and p_value.
References#
D’Agostino, R. B. (1971). An omnibus test of normality for moderate and large sample size. Biometrika.
D’Agostino, R. B. and Pearson, E. S. (1973). Tests for departure from normality. Biometrika.
Anscombe, F. J. and Glynn, W. J. (1983). Distribution of the kurtosis statistic b2 for normal samples. Biometrika.
SciPy documentation: scipy.stats.normaltest, scipy.stats.skewtest, scipy.stats.kurtosistest.